"Our objective in this exploratory data analysis is to gain insights into the videos that are currently popular on YouTube. We will analyze the trends and patterns of the videos that have been identified as 'trending' on the platform, examining factors such as view counts, publication dates, and channel affiliations. By exploring these data points, we hope to better understand what makes a video successful on YouTube and identify potential trends that can inform content creators and marketers."
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import plotly.graph_objs as go
import plotly.express as px
import matplotlib.pyplot as plt
from plotly.offline import init_notebook_mode,iplot
init_notebook_mode(connected=False)
import seaborn as sns
import plotly.offline as pyo
from plotly.subplots import make_subplots
import warnings
# Ignore warnings
warnings.filterwarnings('ignore')
df = pd.read_csv('CA_youtube_trending_data.csv')
df.head()
| | video_id | title | publishedAt | channelId | channelTitle | categoryId | trending_date | tags | view_count | likes | dislikes | comment_count | thumbnail_link | comments_disabled | ratings_disabled | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KX06ksuS6Xo | Diljit Dosanjh: CLASH (Official) Music Video \|... | 2020-08-11T07:30:02Z | UCZRdNleCgW-BGUJf-bbjzQg | Diljit Dosanjh | 10 | 2020-08-12T00:00:00Z | clash diljit dosanjh\|diljit dosanjh\|diljit dos... | 9140911 | 296541 | 6180 | 30059 | https://i.ytimg.com/vi/KX06ksuS6Xo/default.jpg | False | False | CLASH official music video performed by DILJIT... |
| 1 | J78aPJ3VyNs | I left youtube for a month and THIS is what ha... | 2020-08-11T16:34:06Z | UCYzPXprvl5Y-Sf0g4vX-m6g | jacksepticeye | 24 | 2020-08-12T00:00:00Z | jacksepticeye\|funny\|funny meme\|memes\|jacksepti... | 2038853 | 353797 | 2628 | 40222 | https://i.ytimg.com/vi/J78aPJ3VyNs/default.jpg | False | False | I left youtube for a month and this is what ha... |
| 2 | M9Pmf9AB4Mo | Apex Legends \| Stories from the Outlands – “Th... | 2020-08-11T17:00:10Z | UC0ZV6M2THA81QT9hrVWJG3A | Apex Legends | 20 | 2020-08-12T00:00:00Z | Apex Legends\|Apex Legends characters\|new Apex ... | 2381688 | 146740 | 2794 | 16549 | https://i.ytimg.com/vi/M9Pmf9AB4Mo/default.jpg | False | False | While running her own modding shop, Ramya Pare... |
| 3 | 3C66w5Z0ixs | I ASKED HER TO BE MY GIRLFRIEND... | 2020-08-11T19:20:14Z | UCvtRTOMP2TqYqu51xNrqAzg | Brawadis | 22 | 2020-08-12T00:00:00Z | brawadis\|prank\|basketball\|skits\|ghost\|funny vi... | 1514614 | 156914 | 5857 | 35331 | https://i.ytimg.com/vi/3C66w5Z0ixs/default.jpg | False | False | SUBSCRIBE to BRAWADIS ▶ http://bit.ly/Subscrib... |
| 4 | VIUo6yapDbc | Ultimate DIY Home Movie Theater for The LaBran... | 2020-08-11T15:10:05Z | UCDVPcEbVLQgLZX0Rt6jo34A | Mr. Kate | 26 | 2020-08-12T00:00:00Z | The LaBrant Family\|DIY\|Interior Design\|Makeove... | 1123889 | 45803 | 964 | 2198 | https://i.ytimg.com/vi/VIUo6yapDbc/default.jpg | False | False | Transforming The LaBrant Family's empty white ... |
# Load the category file (JSON) into a DataFrame so the category names can be extracted
df1 = pd.read_json('CA_category_id.json')
df1
| | kind | etag | items |
|---|---|---|---|
| 0 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'IfW... |
| 1 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': '5XG... |
| 2 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'HCj... |
| 3 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'ra8... |
| 4 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': '7mq... |
| 5 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': '0Z6... |
| 6 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'K_-... |
| 7 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'I3I... |
| 8 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'D1W... |
| 9 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'QME... |
| 10 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'v2n... |
| 11 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'Qi1... |
| 12 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'IbG... |
| 13 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'gYz... |
| 14 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'hHU... |
| 15 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'KEd... |
| 16 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'tMf... |
| 17 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'tot... |
| 18 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'LNg... |
| 19 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'har... |
| 20 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'M6Y... |
| 21 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'ZFb... |
| 22 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'FD7... |
| 23 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': '7fv... |
| 24 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'H6d... |
| 25 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'Z3y... |
| 26 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': '3F8... |
| 27 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'Hwu... |
| 28 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': 'qJ2... |
| 29 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': '2sK... |
| 30 | youtube#videoCategoryListResponse | kBCr3I9kLHHU79W4Ip5196LDptI | {'kind': 'youtube#videoCategory', 'etag': '3Ia... |
Extracting the category names from the JSON file
# Create an empty list to store categories
categories = []
# Iterate over the 'items' column; enumerate supplies the row position i of each item.
# Caveat: i is the position in the file, not the item's own 'id' field,
# so it matches df['categoryId'] only if the two numberings coincide.
for i, item in enumerate(df1['items']):  # df1['items'].items() would work as well
    # Extract the category name
    category = item['snippet']['title']
    # Append the category name with its row position i as the ID
    categories.append({'categoryId': i, 'category_name': category})
# Create a new DataFrame from the list of categories
df_categories = pd.DataFrame(categories)
# Display the new DataFrame
df_categories
| | categoryId | category_name |
|---|---|---|
| 0 | 0 | Film & Animation |
| 1 | 1 | Autos & Vehicles |
| 2 | 2 | Music |
| 3 | 3 | Pets & Animals |
| 4 | 4 | Sports |
| 5 | 5 | Short Movies |
| 6 | 6 | Travel & Events |
| 7 | 7 | Gaming |
| 8 | 8 | Videoblogging |
| 9 | 9 | People & Blogs |
| 10 | 10 | Comedy |
| 11 | 11 | Entertainment |
| 12 | 12 | News & Politics |
| 13 | 13 | Howto & Style |
| 14 | 14 | Education |
| 15 | 15 | Science & Technology |
| 16 | 16 | Movies |
| 17 | 17 | Anime/Animation |
| 18 | 18 | Action/Adventure |
| 19 | 19 | Classics |
| 20 | 20 | Comedy |
| 21 | 21 | Documentary |
| 22 | 22 | Drama |
| 23 | 23 | Family |
| 24 | 24 | Foreign |
| 25 | 25 | Horror |
| 26 | 26 | Sci-Fi/Fantasy |
| 27 | 27 | Thriller |
| 28 | 28 | Shorts |
| 29 | 29 | Shows |
| 30 | 30 | Trailers |
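The table above pairs each category name with its row position, not with an identifier stored in the record itself. A minimal sketch of reading the identifier from each record instead, assuming every entry in `items` has the shape `{'id': '<number as a string>', 'snippet': {'title': ...}}` as in the YouTube Data API category listing. The three sample records and the name `categories_by_id` are illustrative stand-ins, not the contents of the real file:

```python
import pandas as pd

# Illustrative stand-ins for the records in df1['items'] (assumed shape).
sample_items = [
    {"id": "1", "snippet": {"title": "Film & Animation"}},
    {"id": "2", "snippet": {"title": "Autos & Vehicles"}},
    {"id": "10", "snippet": {"title": "Music"}},
]

# Use each record's own id (stored as a string) instead of its row position.
categories_by_id = pd.DataFrame(
    [
        {"categoryId": int(item["id"]), "category_name": item["snippet"]["title"]}
        for item in sample_items
    ]
)
print(categories_by_id)
```

With this mapping, a video whose `categoryId` is 10 would be labelled by the record whose `id` is `"10"`, regardless of where that record sits in the file.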
Merging the category names into the video DataFrame
data = df_categories.merge(df, on='categoryId')
data
| | categoryId | category_name | video_id | title | publishedAt | channelId | channelTitle | trending_date | tags | view_count | likes | dislikes | comment_count | thumbnail_link | comments_disabled | ratings_disabled | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Autos & Vehicles | 5WjcDji3xYc | Honest Trailers \| Avatar: The Last Airbender | 2020-08-11T17:03:59Z | UCOpcACMWblDls9Z6GERVi1A | Screen Junkies | 2020-08-12T00:00:00Z | screenjunkies\|screen junkies\|honest trailers\|h... | 833369 | 50183 | 1120 | 4634 | https://i.ytimg.com/vi/5WjcDji3xYc/default.jpg | False | False | ►►Subscribe to ScreenJunkies!► https://fandom.... |
| 1 | 1 | Autos & Vehicles | z5l8ovbw_6M | Don't be a Tourist | 2020-08-10T21:28:49Z | UCDQBZcjYKP1J1Nu-Y0_D37Q | Tabbes | 2020-08-12T00:00:00Z | drawing\|humor\|storytime animation\|story\|slice ... | 1061892 | 117220 | 876 | 9311 | https://i.ytimg.com/vi/z5l8ovbw_6M/default.jpg | False | False | This one is for all you full time travelersEMA... |
| 2 | 1 | Autos & Vehicles | yVdH3QacEXc | Selena Gomez - This is the Year (Official Prem... | 2020-08-10T16:32:06Z | UCPNxhDvTcytIdvwXWAm43cA | Selena Gomez | 2020-08-12T00:00:00Z | Selena Gomez\|David Henrie\|Dixie D’Amelio\|Charl... | 1523818 | 163684 | 2377 | 9845 | https://i.ytimg.com/vi/yVdH3QacEXc/default.jpg | False | False | Get your tickets here: https://thisistheyear.f... |
| 3 | 1 | Autos & Vehicles | qQ8domUSU7M | Fall Guys in a Nutshell | 2020-08-07T16:00:24Z | UCV6g95OBbVtFmN9uiJzkFqQ | CircleToonsHD | 2020-08-12T00:00:00Z | Fall Guys in a Nutshell\|Fall guys\|fall\|guys\|vi... | 1045901 | 71591 | 869 | 2734 | https://i.ytimg.com/vi/qQ8domUSU7M/default.jpg | False | False | I've never been THIS infuriated at a game THIS... |
| 4 | 1 | Autos & Vehicles | PORP0q8nThs | Getting Suspended In High School | 2020-08-07T20:52:37Z | UCRfg0SWjIHm_h95e4V8X5og | Young Don The Sauce God | 2020-08-12T00:00:00Z | young don the sauce god\|animations\|animated\|st... | 741546 | 66330 | 523 | 4273 | https://i.ytimg.com/vi/PORP0q8nThs/default.jpg | False | False | No Risk. No Reward. Getting Suspended In High ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 196938 | 29 | Shows | F-kvFACZ5yE | Denzel Washington Reveals the Aftermath of Wil... | 2022-04-03T14:58:54Z | UCjQbTcszB-gRhDByY9WhySw | T.D. Jakes | 2022-04-06T00:00:00Z | denzel washington interview\|discovering the de... | 5037785 | 59136 | 0 | 17582 | https://i.ytimg.com/vi/F-kvFACZ5yE/default.jpg | False | False | During the 2022 International Leadership Summi... |
| 196939 | 29 | Shows | 3qBcdN4BZhM | A Ramadan Without Loneliness \| Ramadan 2022 \| ... | 2022-03-30T19:26:51Z | UCPLUqBXpM_YMFJltmBtuZTw | Islamic Relief Canada | 2022-04-06T00:00:00Z | orphans\|orphan sponsorship\|Islamic relief orph... | 82021 | 53 | 0 | 0 | https://i.ytimg.com/vi/3qBcdN4BZhM/default.jpg | True | False | This Ramadan 2022, Islamic Relief is continuin... |
| 196940 | 29 | Shows | F-kvFACZ5yE | Denzel Washington Reveals the Aftermath of Wil... | 2022-04-03T14:58:54Z | UCjQbTcszB-gRhDByY9WhySw | T.D. Jakes | 2022-04-07T00:00:00Z | denzel washington interview\|discovering the de... | 5281932 | 62341 | 0 | 18241 | https://i.ytimg.com/vi/F-kvFACZ5yE/default.jpg | False | False | During the 2022 International Leadership Summi... |
| 196941 | 29 | Shows | F-kvFACZ5yE | Denzel Washington Reveals the Aftermath of Wil... | 2022-04-03T14:58:54Z | UCjQbTcszB-gRhDByY9WhySw | T.D. Jakes | 2022-04-08T00:00:00Z | denzel washington interview\|discovering the de... | 5436102 | 63996 | 0 | 18418 | https://i.ytimg.com/vi/F-kvFACZ5yE/default.jpg | False | False | During the 2022 International Leadership Summi... |
| 196942 | 29 | Shows | F-kvFACZ5yE | Denzel Washington Reveals the Aftermath of Wil... | 2022-04-03T14:58:54Z | UCjQbTcszB-gRhDByY9WhySw | T.D. Jakes | 2022-04-09T00:00:00Z | denzel washington interview\|discovering the de... | 5545806 | 65037 | 0 | 18514 | https://i.ytimg.com/vi/F-kvFACZ5yE/default.jpg | False | False | During the 2022 International Leadership Summi... |
196943 rows × 17 columns
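`merge` performs an inner join by default, so any video whose `categoryId` has no match in the lookup table is dropped without notice. A small sketch of how unmatched rows could be surfaced with a left join and the `indicator` flag. The two toy frames and the names `videos`, `lookup`, `checked`, and `unmatched` are hypothetical:

```python
import pandas as pd

# Toy stand-ins for the video table and the category lookup (hypothetical values).
videos = pd.DataFrame({"video": ["a", "b", "c"], "categoryId": [1, 10, 99]})
lookup = pd.DataFrame(
    {"categoryId": [1, 10], "category_name": ["Film & Animation", "Music"]}
)

# A left join keeps every video; indicator=True adds a _merge column
# recording whether each row found a partner in the lookup table.
checked = videos.merge(lookup, on="categoryId", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]

print(len(videos), len(checked), len(unmatched))
```

Comparing the row counts before and after the join is a cheap way to confirm that nothing was lost.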
utube = data.copy()
Checking for any null values
utube.isna().sum()
categoryId 0
category_name 0
video_id 0
title 0
publishedAt 0
channelId 0
channelTitle 0
trending_date 0
tags 0
view_count 0
likes 0
dislikes 0
comment_count 0
thumbnail_link 0
comments_disabled 0
ratings_disabled 0
description 4096
dtype: int64
# Safeguard only: channelTitle has no nulls in the check above; the only nulls are in description, which is dropped next
utube.dropna(subset=['channelTitle'], inplace=True)
# Drop the columns that are not needed for the analysis
utube.drop(['categoryId', 'video_id', 'channelId', 'thumbnail_link', 'description'], axis=1, inplace=True)
# Rename the remaining columns to a consistent, readable form
utube.rename({'category_name': 'category',
              'publishedAt': 'published_at',
              'channelTitle': 'channel_title'},
             axis=1, inplace=True)
utube.head()
| | category | title | published_at | channel_title | trending_date | tags | view_count | likes | dislikes | comment_count | comments_disabled | ratings_disabled |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Autos & Vehicles | Honest Trailers \| Avatar: The Last Airbender | 2020-08-11T17:03:59Z | Screen Junkies | 2020-08-12T00:00:00Z | screenjunkies\|screen junkies\|honest trailers\|h... | 833369 | 50183 | 1120 | 4634 | False | False |
| 1 | Autos & Vehicles | Don't be a Tourist | 2020-08-10T21:28:49Z | Tabbes | 2020-08-12T00:00:00Z | drawing\|humor\|storytime animation\|story\|slice ... | 1061892 | 117220 | 876 | 9311 | False | False |
| 2 | Autos & Vehicles | Selena Gomez - This is the Year (Official Prem... | 2020-08-10T16:32:06Z | Selena Gomez | 2020-08-12T00:00:00Z | Selena Gomez\|David Henrie\|Dixie D’Amelio\|Charl... | 1523818 | 163684 | 2377 | 9845 | False | False |
| 3 | Autos & Vehicles | Fall Guys in a Nutshell | 2020-08-07T16:00:24Z | CircleToonsHD | 2020-08-12T00:00:00Z | Fall Guys in a Nutshell\|Fall guys\|fall\|guys\|vi... | 1045901 | 71591 | 869 | 2734 | False | False |
| 4 | Autos & Vehicles | Getting Suspended In High School | 2020-08-07T20:52:37Z | Young Don The Sauce God | 2020-08-12T00:00:00Z | young don the sauce god\|animations\|animated\|st... | 741546 | 66330 | 523 | 4273 | False | False |
utube.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 196943 entries, 0 to 196942
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 category 196943 non-null object
1 title 196943 non-null object
2 published_at 196943 non-null object
3 channel_title 196943 non-null object
4 trending_date 196943 non-null object
5 tags 196943 non-null object
6 view_count 196943 non-null int64
7 likes 196943 non-null int64
8 dislikes 196943 non-null int64
9 comment_count 196943 non-null int64
10 comments_disabled 196943 non-null bool
11 ratings_disabled 196943 non-null bool
dtypes: bool(2), int64(4), object(6)
memory usage: 16.9+ MB
The date columns are stored as strings (object dtype), so they are converted to datetime before any date arithmetic.
# Parse the ISO timestamps, drop the UTC marker, and keep only the calendar date
utube['published_at'] = pd.to_datetime(utube['published_at']).dt.tz_localize(None).dt.normalize()
utube['trending_date'] = pd.to_datetime(utube['trending_date']).dt.tz_localize(None).dt.normalize()
# Derive the month abbreviation and the day of the month of publication
utube['publish_month'] = utube['published_at'].dt.strftime('%b')
utube['publish_day'] = utube['published_at'].dt.day
utube.head()
| | category | title | published_at | channel_title | trending_date | tags | view_count | likes | dislikes | comment_count | comments_disabled | ratings_disabled | publish_month | publish_day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Autos & Vehicles | Honest Trailers \| Avatar: The Last Airbender | 2020-08-11 | Screen Junkies | 2020-08-12 | screenjunkies\|screen junkies\|honest trailers\|h... | 833369 | 50183 | 1120 | 4634 | False | False | Aug | 11 |
| 1 | Autos & Vehicles | Don't be a Tourist | 2020-08-10 | Tabbes | 2020-08-12 | drawing\|humor\|storytime animation\|story\|slice ... | 1061892 | 117220 | 876 | 9311 | False | False | Aug | 10 |
| 2 | Autos & Vehicles | Selena Gomez - This is the Year (Official Prem... | 2020-08-10 | Selena Gomez | 2020-08-12 | Selena Gomez\|David Henrie\|Dixie D’Amelio\|Charl... | 1523818 | 163684 | 2377 | 9845 | False | False | Aug | 10 |
| 3 | Autos & Vehicles | Fall Guys in a Nutshell | 2020-08-07 | CircleToonsHD | 2020-08-12 | Fall Guys in a Nutshell\|Fall guys\|fall\|guys\|vi... | 1045901 | 71591 | 869 | 2734 | False | False | Aug | 7 |
| 4 | Autos & Vehicles | Getting Suspended In High School | 2020-08-07 | Young Don The Sauce God | 2020-08-12 | young don the sauce god\|animations\|animated\|st... | 741546 | 66330 | 523 | 4273 | False | False | Aug | 7 |
utube['publish_month'].unique()
array(['Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr',
'May', 'Jun', 'Jul'], dtype=object)
utube['publish_day'].nunique()
31
utube.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| view_count | 196943.0 | 2.396671e+06 | 6.225579e+06 | 0.0 | 448551.0 | 925720.0 | 2099926.5 | 264407389.0 |
| likes | 196943.0 | 1.274638e+05 | 3.697903e+05 | 0.0 | 17868.0 | 43493.0 | 109655.5 | 16021548.0 |
| dislikes | 196943.0 | 1.592899e+03 | 9.533623e+03 | 0.0 | 0.0 | 0.0 | 787.0 | 879357.0 |
| comment_count | 196943.0 | 9.642997e+03 | 7.539491e+04 | 0.0 | 1137.0 | 2619.0 | 6257.0 | 6738536.0 |
| publish_day | 196943.0 | 1.557807e+01 | 8.783524e+00 | 1.0 | 8.0 | 15.0 | 23.0 | 31.0 |
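The scientific notation in the summary above is a display setting, not a property of the stored values. One way to print such a summary with plain numbers is to set pandas' `display.float_format` option temporarily. This is a sketch on a three-row toy frame whose values are borrowed from the summary purely for illustration; `sample` and `summary_text` are hypothetical names:

```python
import pandas as pd

# Toy column reusing a few magnitudes from the summary above (illustration only).
sample = pd.DataFrame({"view_count": [448551, 925720, 264407389]})

# option_context applies the float format only inside the with-block,
# so the rest of the notebook keeps the default display settings.
with pd.option_context("display.float_format", "{:,.1f}".format):
    summary_text = sample.describe().T.to_string()

print(summary_text)
```

Because only the display option changes, the underlying column can stay as an integer type.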
def convert_scientific_to_decimal(value):
    # Cast to float and round to two decimals; this changes the dtype, not how pandas displays large numbers
    return round(float(value), 2)
utube['view_count'] = utube['view_count'].apply(convert_scientific_to_decimal)
# Group the videos by category and calculate the average metrics
category_metrics = utube.groupby('category')[['view_count', 'likes', 'dislikes', 'comment_count']].mean()
category_metrics.sort_values(by=['view_count', 'likes', 'dislikes', 'comment_count'], ascending=[False, False, False, False], inplace=True)
category_metrics = category_metrics.round()
category_metrics
| category | view_count | likes | dislikes | comment_count |
|---|---|---|---|---|
| Shows | 3412230.0 | 174444.0 | 4253.0 | 4385.0 |
| Comedy | 2895858.0 | 181284.0 | 2000.0 | 18023.0 |
| Foreign | 2810967.0 | 145265.0 | 1636.0 | 7748.0 |
| Autos & Vehicles | 2442090.0 | 104692.0 | 929.0 | 7255.0 |
| Shorts | 2402510.0 | 99525.0 | 1421.0 | 5461.0 |
| Drama | 2273363.0 | 120202.0 | 2368.0 | 6066.0 |
| Family | 2094707.0 | 137317.0 | 1931.0 | 5491.0 |
| Thriller | 1697511.0 | 92165.0 | 729.0 | 4807.0 |
| Anime/Animation | 1624911.0 | 41009.0 | 631.0 | 3128.0 |
| Sci-Fi/Fantasy | 1480570.0 | 69848.0 | 1477.0 | 3673.0 |
| Horror | 1380213.0 | 19780.0 | 997.0 | 4295.0 |
| Science & Technology | 1378854.0 | 69046.0 | 568.0 | 3093.0 |
| Music | 1012436.0 | 46504.0 | 519.0 | 3355.0 |
| Classics | 1007103.0 | 54525.0 | 413.0 | 2729.0 |
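The summary statistics earlier show a mean view count (about 2.4 million) far above the median (about 0.93 million), so a few very large videos can pull category averages upward. A minimal sketch of reporting the median next to the mean for each group, using a made-up six-row frame. `toy` and `per_category` are hypothetical names:

```python
import pandas as pd

# Made-up data: category A contains one outlier, category B does not.
toy = pd.DataFrame({
    "category": ["A", "A", "A", "B", "B", "B"],
    "view_count": [100, 200, 9000, 400, 500, 600],
})

# agg with a list of function names returns one column per statistic.
per_category = toy.groupby("category")["view_count"].agg(["mean", "median"])
print(per_category)
```

In the toy data the outlier lifts the mean of A well above B, while the medians rank the two categories the other way round.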
# Create a list of metric names to plot
metric_names = ['view_count', 'likes', 'dislikes', 'comment_count']
# Loop through each metric and create a sorted bar chart
for metric in metric_names:
    # Sort the DataFrame by the current metric in descending order
    sorted_metrics = category_metrics.sort_values(metric, ascending=False)
    # Create a Bar trace object with the sorted values
    trace = go.Bar(x=sorted_metrics.index, y=sorted_metrics[metric])
    # Create the Figure object
    fig = go.Figure(data=[trace])
    # Build a readable label, e.g. 'view_count' becomes 'View Count'
    label = metric.replace('_', ' ').title()
    # Add title and axis labels
    fig.update_layout(title='Average {} by Category'.format(label),
                      xaxis_title='Category',
                      yaxis_title='Average {}'.format(label))
    # Show the plot
    fig.show()
utube.head()
| | category | title | published_at | channel_title | trending_date | tags | view_count | likes | dislikes | comment_count | comments_disabled | ratings_disabled | publish_month | publish_day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Autos & Vehicles | Honest Trailers \| Avatar: The Last Airbender | 2020-08-11 | Screen Junkies | 2020-08-12 | screenjunkies\|screen junkies\|honest trailers\|h... | 833369.0 | 50183 | 1120 | 4634 | False | False | Aug | 11 |
| 1 | Autos & Vehicles | Don't be a Tourist | 2020-08-10 | Tabbes | 2020-08-12 | drawing\|humor\|storytime animation\|story\|slice ... | 1061892.0 | 117220 | 876 | 9311 | False | False | Aug | 10 |
| 2 | Autos & Vehicles | Selena Gomez - This is the Year (Official Prem... | 2020-08-10 | Selena Gomez | 2020-08-12 | Selena Gomez\|David Henrie\|Dixie D’Amelio\|Charl... | 1523818.0 | 163684 | 2377 | 9845 | False | False | Aug | 10 |
| 3 | Autos & Vehicles | Fall Guys in a Nutshell | 2020-08-07 | CircleToonsHD | 2020-08-12 | Fall Guys in a Nutshell\|Fall guys\|fall\|guys\|vi... | 1045901.0 | 71591 | 869 | 2734 | False | False | Aug | 7 |
| 4 | Autos & Vehicles | Getting Suspended In High School | 2020-08-07 | Young Don The Sauce God | 2020-08-12 | young don the sauce god\|animations\|animated\|st... | 741546.0 | 66330 | 523 | 4273 | False | False | Aug | 7 |
# Average metrics per channel, ranked by average views
grouped_data = utube.groupby('channel_title')[['view_count', 'likes', 'dislikes', 'comment_count']].mean()
grouped_data.sort_values(by='view_count', ascending=False, inplace=True)
grouped_data.head()
| channel_title | view_count | likes | dislikes | comment_count |
|---|---|---|---|---|
| CHANDAN ART ACADEMY | 1.153215e+08 | 6.147769e+06 | 0.000000 | 40101.166667 |
| Mv Ryhan | 8.556066e+07 | 1.410929e+06 | 83015.615385 | 4991.692308 |
| Dr.Harrsha Artist | 8.338510e+07 | 5.499444e+06 | 0.000000 | 25629.333333 |
| mingweirocks | 7.760728e+07 | 1.800318e+06 | 98090.333333 | 7192.166667 |
| FAMILY BOOMS | 7.452598e+07 | 2.566466e+06 | 105765.333333 | 15481.666667 |
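The fractional parts of the averages above (for example .166667 and .333333) suggest that some of these channels contribute only a handful of rows, so a single viral video can put a channel at the top. A sketch of ranking only channels that have a minimum number of rows, on made-up data. The threshold `min_rows` and the names `toy`, `stats`, and `ranked` are hypothetical choices, not part of the original analysis:

```python
import pandas as pd

# Made-up data: one channel with a single big row, one with three steady rows.
toy = pd.DataFrame({
    "channel_title": ["one_hit", "steady", "steady", "steady"],
    "view_count": [9000, 3000, 4000, 5000],
})

# Compute the mean together with the number of rows behind it.
stats = toy.groupby("channel_title")["view_count"].agg(["mean", "count"])

min_rows = 3  # hypothetical threshold; tune to the dataset
ranked = stats[stats["count"] >= min_rows].sort_values("mean", ascending=False)
print(ranked)
```

Keeping the `count` column beside the mean also makes it easy to see how much data supports each ranking position.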
# Keep only the five channels with the highest average views so x and y have the same length
top5 = grouped_data.head(5)
fig = go.Figure()
# Only the first trace starts visible, matching the active dropdown entry
fig.add_trace(go.Bar(x=top5.index, y=top5['view_count'], name='Views'))
fig.add_trace(go.Bar(x=top5.index, y=top5['likes'], name='Likes', visible=False))
fig.add_trace(go.Bar(x=top5.index, y=top5['dislikes'], name='Dislikes', visible=False))
fig.add_trace(go.Bar(x=top5.index, y=top5['comment_count'], name='Comments', visible=False))
fig.update_layout(
    updatemenus=[
        dict(
            type='dropdown',
            buttons=[
                dict(label='Views',
                     method='update',
                     args=[{'visible': [True, False, False, False]}]),
                dict(label='Likes',
                     method='update',
                     args=[{'visible': [False, True, False, False]}]),
                dict(label='Dislikes',
                     method='update',
                     args=[{'visible': [False, False, True, False]}]),
                dict(label='Comments',
                     method='update',
                     args=[{'visible': [False, False, False, True]}])
            ],
            active=0,
            showactive=True
        )
    ]
)
fig.show()
# Create a correlation matrix
corr_matrix = utube[['view_count', 'likes', 'dislikes', 'comment_count']].corr()
# Create a heatmap using plotly.graph_objs
heatmap = go.Heatmap(
    z=corr_matrix.values,
    x=corr_matrix.columns.values,
    y=corr_matrix.index.values,
    colorscale="Greens"
)
# Set the title of the plot
layout = go.Layout(
    title="Correlation Matrix Heatmap",
    autosize=False
)
# Create a figure and plot the heatmap
fig = go.Figure(data=[heatmap], layout=layout)
# Show the plot
fig.show()
# Create a scatter plot of likes (y-axis) against dislikes (x-axis)
plt.scatter(x='dislikes', y='likes', data=utube)
plt.title('Relationship between Likes and Dislikes')
plt.xlabel('Dislikes')
plt.ylabel('Likes')
plt.show()
# Select videos with views greater than or equal to 1 million
high_views = utube[utube['view_count'] >= 1000000]
# Combine all tags from the selected videos into a single list
all_tags = high_views['tags'].str.split('|').tolist()
all_tags = [tag for tags in all_tags for tag in tags]
# Count the occurrence of each tag
tag_counts = pd.Series(all_tags).value_counts()
print(tag_counts.head(10))
[None] 16591
funny 5760
minecraft 3890
comedy 3766
challenge 2883
highlights 1617
vlog 1605
fun 1601
football 1590
tiktok 1579
dtype: int64
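The most frequent entry in the counts above is `[None]`, which appears to be the placeholder stored for videos without tags rather than a real tag. A small sketch of counting tags with `explode` while filtering that placeholder and folding case, on a three-row toy column. `toy`, `tags`, and `tag_counts_clean` are hypothetical names:

```python
import pandas as pd

# Toy tag strings in the same pipe-separated layout as the tags column.
toy = pd.DataFrame({"tags": ["funny|comedy", "[None]", "Comedy|vlog"]})

# Split each string into a list, then explode to one tag per row.
tags = toy["tags"].str.split("|").explode()

# Drop the assumed no-tags placeholder and treat 'Comedy' and 'comedy' as the same tag.
tags = tags[tags != "[None]"].str.lower()
tag_counts_clean = tags.value_counts()

print(tag_counts_clean)
```

Whether to fold case is a judgment call; it merges variants of the same word but also hides any intentional capitalisation.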
# Create a bar chart of the top 10 most commonly used tags
bar = go.Bar(
    x=tag_counts.head(10).index,
    y=tag_counts.head(10).values,
    marker=dict(color=tag_counts.head(10).values, colorscale='Viridis'),
)
# Set the layout of the chart
layout = go.Layout(
    title='Top 10 Most Commonly Used Tags in Videos with High Views',
    xaxis=dict(title='Tag'),
    yaxis=dict(title='Count'),
)
# Combine the chart and layout, and plot the chart
fig = go.Figure(data=[bar], layout=layout)
fig.show()
# Total views and likes per publish month, largest first (a list of columns needs double brackets)
month = utube.groupby('publish_month')[['view_count', 'likes']].sum().sort_values(by=['view_count', 'likes'], ascending=[False, False])
month
| publish_month | view_count | likes |
|---|---|---|
| Dec | 4.949529e+10 | 2546164299 |
| Mar | 4.547636e+10 | 2394696620 |
| Oct | 4.520997e+10 | 2567250881 |
| Jun | 4.143550e+10 | 2064449171 |
| Aug | 3.951106e+10 | 2332750238 |
| Sep | 3.888670e+10 | 2275353118 |
| Feb | 3.729378e+10 | 1781207179 |
| Jan | 3.668217e+10 | 1927015428 |
| Apr | 3.643994e+10 | 1831073251 |
| Nov | 3.608132e+10 | 2076016412 |
| May | 3.509804e+10 | 1654724976 |
| Jul | 3.039737e+10 | 1652393100 |
fig = go.Figure()
# One bar trace per metric; only the first is visible until the dropdown changes it
fig.add_trace(go.Bar(x=month.index, y=month['view_count'], name='Total Views'))
fig.add_trace(go.Bar(x=month.index, y=month['likes'], name='Total Likes', visible=False))
fig.update_layout(
    title='Total Views by Publish Month',
    updatemenus=[
        dict(
            type='dropdown',
            buttons=[
                dict(label='Total Views',
                     method='update',
                     args=[{'visible': [True, False]},
                           {'title': 'Total Views by Publish Month'}]),
                dict(label='Total Likes',
                     method='update',
                     args=[{'visible': [False, True]},
                           {'title': 'Total Likes by Publish Month'}])
            ],
            # Start on the first button so the menu matches the visible trace
            active=0,
            showactive=True
        )
    ]
)
fig.show()
# Group the data by channel_title and count the rows; a video that trends on several days is counted once per day
channel_counts = utube.groupby('channel_title')['title'].count()
# Sort the channels by the number of trending videos in descending order
sorted_channels = channel_counts.sort_values(ascending=False)
# Plot the result on a horizontal bar graph
plt.barh(sorted_channels.index[:10], sorted_channels.values[:10])
plt.title('Top 10 Channels with Most Trending Videos')
plt.xlabel('Number of Trending Videos')
plt.show()
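Because the same video can appear on several trending dates, a row count per channel measures trending appearances, not distinct videos. A minimal sketch contrasting the two on made-up rows, using the title as a stand-in identifier since `video_id` was dropped earlier. `toy`, `appearances`, and `distinct_videos` are hypothetical names:

```python
import pandas as pd

# Made-up rows: channel X has one video that trended twice plus one other video.
toy = pd.DataFrame({
    "channel_title": ["X", "X", "X", "Y", "Y"],
    "title": ["v1", "v1", "v2", "v3", "v4"],
})

# count() tallies every row; nunique() tallies distinct titles.
appearances = toy.groupby("channel_title")["title"].count()
distinct_videos = toy.groupby("channel_title")["title"].nunique()

print(appearances)
print(distinct_videos)
```

Which measure is appropriate depends on the question: staying power on the trending list, or the number of separate videos that reached it.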
# Convert the published_at column to datetime format (a no-op if it already is)
utube['published_at'] = pd.to_datetime(utube['published_at'])
# Extract the day of the week; this overwrites the day-of-month values stored in publish_day earlier
utube['publish_day'] = utube['published_at'].dt.day_name()
# Group by day of the week and calculate the average views, in calendar order
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
avg_views = utube.groupby('publish_day')[['view_count']].mean().reindex(weekday_order)
# Plot the result on a bar graph
avg_views.plot(kind='bar')
plt.title('Average Views by Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Average Views')
plt.show()
# Filter the data to only include videos published within the last week.
# The cutoff is relative to the moment the cell runs, so the result changes between runs.
one_week_ago = pd.Timestamp.now() - pd.Timedelta(days=7)
# .copy() gives an independent DataFrame, so the new column below is not written to a slice of utube
recent_videos = utube[utube['published_at'] >= one_week_ago].copy()
# Sort the recent_videos dataframe by view count in descending order
sorted_videos = recent_videos.sort_values(by='view_count', ascending=False)
# Days between publication and the trending date
sorted_videos['trend_time'] = (sorted_videos['trending_date'] - sorted_videos['published_at']).dt.days
# For each channel keep its most viewed row: title and trend_time come from the first (highest-view) row
max_views_per_channel = sorted_videos.groupby('channel_title').agg({'title': 'first', 'view_count': 'max', 'trend_time': 'first'})
max_views_per_channel.sort_values(by=['view_count', 'trend_time'], ascending=[False, True], inplace=True)
max_views_per_channel
| channel_title | title | view_count | trend_time |
|---|---|---|---|
| NBA | #6 WARRIORS at #3 KINGS \| FULL GAME 2 HIGHLIGH... | 2132127.0 | 1 |
| NBA on TNT | “What Was He Supposed To Do?” \| Inside Reacts ... | 1162124.0 | 1 |
| Skip and Shannon: UNDISPUTED | Draymond Green ejected for stomping on Sabonis... | 682492.0 | 1 |
| Babish Culinary Universe | Binging with Babish: Tyler's Bullsh*t from The... | 650329.0 | 1 |
| Bleacher Report | Draymond Ejected After STEPPING On Sabonis 😳 | 627056.0 | 1 |
| fantano | Frank Ocean Flopped | 528842.0 | 1 |
| NCT | NCT DOJAEJUNG 엔시티 도재정 'Perfume' Performance Video | 514476.0 | 1 |
| Wendover Productions | How Corporate Consolidation is Killing Ski Towns | 465326.0 | 1 |
| Practical Engineering | East Palestine Train Derailment Explained | 367154.0 | 1 |
| Eddie Hall The Beast | Reunited with Brian Shaw \| World's Strongest M... | 325737.0 | 1 |
| NHL | Edmonton Oilers falter in Game 1 \| Kings @ Oil... | 237337.0 | 1 |
| SPORTSNET | NHL Game 1 Highlights \| Kings vs. Oilers - Apr... | 233953.0 | 1 |
| Nick Viall | Freestyle with Love is Blind’s Marshall Glaze ... | 230990.0 | 1 |
| OfflineTV & Friends | gotta catch 'em all! | 193400.0 | 1 |
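The one-week window above is measured back from the time the notebook runs, so rerunning it later against the same file would select different rows, or none. A sketch of anchoring the window to the latest trending date in the data instead, on a made-up three-row frame. The dates and the names `toy`, `reference`, `cutoff`, and `recent` are hypothetical:

```python
import pandas as pd

# Made-up rows with already-parsed dates.
toy = pd.DataFrame({
    "title": ["old", "recent", "newest"],
    "published_at": pd.to_datetime(["2023-03-01", "2023-04-15", "2023-04-18"]),
    "trending_date": pd.to_datetime(["2023-03-03", "2023-04-16", "2023-04-19"]),
})

# Anchor the window to the data itself rather than to the current clock.
reference = toy["trending_date"].max()
cutoff = reference - pd.Timedelta(days=7)

# .copy() so the new column is added to an independent frame.
recent = toy[toy["published_at"] >= cutoff].copy()
recent["trend_time"] = (recent["trending_date"] - recent["published_at"]).dt.days

print(recent)
```

With a data-derived reference date, the same input file always yields the same selection.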
# Columns to plot for the top five channels
graph_names = ['view_count', 'trend_time']
# Loop through each column and create a bar chart
for graph in graph_names:
    # Create a Bar trace object for the first five channels
    trace = go.Bar(x=max_views_per_channel.index[:5], y=max_views_per_channel[graph][:5])
    # Create the Figure object
    fig = go.Figure(data=[trace])
    # Add title and axis labels
    fig.update_layout(title='Most viewed video per channel within a short span',
                      xaxis_title='Channel Name',
                      yaxis_title=graph.replace('_', ' ').title())
    # Show the plot
    fig.show()